library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(ggplot2)
sales <- read.csv("~/Downloads/TOTALSA.csv")
sales_ts <- ts(sales$Sales, start = c(2019,1) , end = c(2024), frequency = 12)
plot(sales_ts, main='Monthly sales for the US Car company', xlab ='Year', ylab='Sales')
A significant drop is observed near 2020, likely due to the COVID-19 pandemic. In the later part of the series (2023 onwards), sales appear to stabilize with smaller variations. Sales values fluctuate in a repeating pattern across months, but no clear long-run upward or downward trend is evident.
summary(sales_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.944 14.168 15.932 15.602 16.968 18.697
boxplot(sales_ts, main ="Boxplot for Sales")
acf(sales_ts, main = "ACF of Sales")
The strong autocorrelation at early lags indicates persistence in the sales data: past values carry information about current ones, and there may be a repeating seasonal pattern. The correlation weakens at longer lags and eventually becomes insignificant.
This summary suggests that the sales values are concentrated between roughly 14 and 17, with the mean slightly below the median, indicating a slight left skew in the data. The boxplot shows a low outlier, corresponding to the sharp drop in sales observed in 2020.
Data before 2022 is excluded because of the sharp drop during the pandemic; from 2022 onward the fluctuations are smaller and more regular than before.
sales_window_ts <- window(sales_ts, start= c(2022,1), end = c(2024), frequency = 12)
plot(sales_window_ts, main='Monthly sales for the US Car company', xlab ='Year', ylab='Sales')
decompose_sales <- decompose(sales_window_ts)
plot(decompose_sales)
stl_sales <- stl(sales_window_ts, s.window = "periodic" )
plot(stl_sales)
The sales time series is clearly seasonal and shows a consistent repeating pattern, as evidenced by the seasonal component. The upward trend indicates overall growth in sales over time. Some random fluctuations are present, but they are relatively minor compared to the seasonality and trend components.
Yes, the time series is seasonal: the decomposition shows a recurring monthly pattern in sales, which confirms the seasonality.
The decomposition is additive: the seasonal swings stay roughly the same size over time rather than growing in proportion to the trend level.
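As a check on the additive assumption, a multiplicative decomposition can be fit with base R's decompose(); if the seasonal swings grew with the level, that form would fit better. A minimal sketch, assuming sales_window_ts from above is in the workspace:

```r
# Compare additive vs. multiplicative decomposition of the windowed series.
dec_add  <- decompose(sales_window_ts, type = "additive")
dec_mult <- decompose(sales_window_ts, type = "multiplicative")

# Additive seasonal effects are in the units of the series;
# multiplicative effects are ratios around 1.
round(dec_add$figure, 3)   # monthly seasonal offsets
round(dec_mult$figure, 3)  # monthly seasonal factors
```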
seasonal_indices <- stl_sales$time.series[,"seasonal"]
print(seasonal_indices)
## Jan Feb Mar Apr May Jun
## 2022 0.23925438 0.03832829 0.08788176 0.61561835 -0.40614507 -0.02635658
## 2023 0.23925438 0.03832829 0.08788176 0.61561835 -0.40614507 -0.02635658
## 2024 0.23925438
## Jul Aug Sep Oct Nov Dec
## 2022 -0.04206809 -0.19058274 -0.06309740 0.13656239 -0.05027783 -0.33911719
## 2023 -0.04206809 -0.19058274 -0.06309740 0.13656239 -0.05027783 -0.33911719
## 2024
month_high <- which.max(seasonal_indices)
month_low <- which.min(seasonal_indices)
print(paste("Month with highest seasonal value:", month_high))
## [1] "Month with highest seasonal value: 4"
print(paste("Month with lowest seasonal value:", month_low))
## [1] "Month with lowest seasonal value: 5"
Month with the highest seasonal index: April. Month with the lowest: May.
Manufacturers often launch spring sales events or introduce new models in April, which lifts sales. May typically lacks the major promotions seen in April, so sales dip.
seasadj(stl_sales)
## Jan Feb Mar Apr May Jun Jul Aug
## 2022 14.62675 14.12967 14.16512 14.06538 13.69215 13.69536 13.90907 14.25458
## 2023 15.42575 15.39567 15.58012 15.83038 16.35415 16.45436 16.34407 16.09958
## 2024 15.27375
## Sep Oct Nov Dec
## 2022 14.19910 14.95444 14.78328 14.27412
## 2023 16.25610 15.66244 15.99528 16.72512
## 2024
plot(sales_window_ts)
lines(seasadj(stl_sales), col ="Red")
The red line is the seasonally adjusted series, i.e., the original series with the seasonal component removed. The gap between the actual and adjusted series at any point shows the seasonal effect for that month.
naive_forecast <- naive(sales_window_ts, h = 12)
plot(naive_forecast)
Naive model: the simplest benchmark, forecasting that every future value equals the last observed value. The forecast line is therefore constant and flat for the next 12 months.
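Since the series is seasonal, a seasonal naive benchmark (forecasting each month with the value from the same month one year earlier) is a natural comparison. This is a sketch, not part of the original analysis:

```r
# Seasonal naive: each month's forecast is the value from the same month last year.
snaive_forecast <- snaive(sales_window_ts, h = 12)
plot(snaive_forecast)
accuracy(snaive_forecast)  # compare against accuracy(naive_forecast)
```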
plot(residuals(naive_forecast) , main = "Residuals from Naïve Method",
ylab = "Residuals",
xlab = "Time")
The residual plot shows a pattern in the residuals, indicating that the Naive method does not capture the seasonality in the data.
hist(residuals(naive_forecast), main = "Residuals from Naïve Method",
xlab = "Residuals")
The histogram of residuals shows values spread around zero, but the distribution is asymmetric, with some large positive and negative residuals. The residuals are not purely random, further indicating that the naive method does not fully capture the underlying structure of the data.
cbind(Fitted = fitted(naive_forecast),
Residuals=residuals(naive_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
The Naïve method does not sufficiently model the underlying structure of the car sales data, particularly its seasonality and trend.
cbind(Data=sales_window_ts,
Residuals=residuals(naive_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
There seems to be a relationship between the magnitude of actual sales values and residuals, indicating that errors are not consistent across all levels of sales.
Acf(residuals(naive_forecast))
The presence of significant autocorrelation in the residuals confirms that the Naïve method is insufficient for forecasting this time series.
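The visual impression from the ACF can be backed by a formal Ljung-Box test on the residuals (a sketch; lag = 10 is an arbitrary conventional choice):

```r
# Ljung-Box test: the null hypothesis is that the residuals are uncorrelated
# (white noise); a small p-value indicates leftover structure.
Box.test(residuals(naive_forecast), lag = 10, type = "Ljung-Box")

# checkresiduals() from the forecast package bundles the residual time plot,
# histogram, ACF, and Ljung-Box test in one call.
checkresiduals(naive_forecast)
```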
accuracy(naive_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02695833 0.6486963 0.5072917 0.08181538 3.379049 0.306364
## ACF1
## Training set -0.393228
forecast_nextYear <- forecast(naive_forecast, h=12)
plot(forecast_nextYear)
print(forecast_nextYear)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 15.513 14.68166 16.34434 14.24158 16.78442
## Mar 2024 15.513 14.33731 16.68869 13.71494 17.31106
## Apr 2024 15.513 14.07308 16.95292 13.31083 17.71517
## May 2024 15.513 13.85032 17.17568 12.97016 18.05584
## Jun 2024 15.513 13.65407 17.37193 12.67002 18.35598
## Jul 2024 15.513 13.47665 17.54935 12.39867 18.62733
## Aug 2024 15.513 13.31349 17.71251 12.14914 18.87686
## Sep 2024 15.513 13.16162 17.86438 11.91688 19.10912
## Oct 2024 15.513 13.01899 18.00701 11.69874 19.32726
## Nov 2024 15.513 12.88408 18.14192 11.49241 19.53359
## Dec 2024 15.513 12.75576 18.27024 11.29617 19.72983
## Jan 2025 15.513 12.63316 18.39284 11.10867 19.91733
It is forecasting that future values will be the same as the last observed value. The forecasted line is constant and flat for the next 12 months. The predicted sales for each month is 15.513 million, a flat value, which indicates that the Naïve method assumes no change in the sales trend or seasonality.
While the naive method achieves a low ME, its relatively high RMSE and MAPE highlight its limitations in capturing the trend and seasonality of the series.
The predicted value for January 2025 is 15.513 million units.
The Naive method is too simple for this time series because it fails to capture both the upward trend and the seasonal variations.
simple_ma3 <- ma(sales_window_ts, order = 3)
plot(sales_window_ts, col ="black", lwd =2)
lines(simple_ma3, col='red', lwd =2)
The simple moving average of order 3 tracks the actual time series most closely.
simple_ma6 <- ma(sales_window_ts, order = 6)
plot(sales_window_ts, col ="black", lwd =2)
lines(simple_ma6, col='blue', lwd =2)
The simple moving average of order 6 is noticeably smoother than the actual time series.
simple_ma9 <- ma(sales_window_ts, order = 9)
plot(sales_window_ts, col ="black", lwd =2)
lines(simple_ma9, col='green', lwd =2)
The simple moving average of order 9 is smoother than both the order-3 and order-6 averages.
plot(sales_window_ts, col ="black", lwd =2)
lines(simple_ma3, col='red', lwd =2)
lines(simple_ma6, col='blue', lwd=2)
lines(simple_ma9, col= 'green',lwd=2)
legend("topleft", legend = c("Original", "SMA(3)", "SMA(6)", "SMA(9)"),
col = c("black", "red", "blue", "green"), lwd =2)
fore_sa <- forecast(simple_ma3, h=12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
plot(fore_sa, main ='SMA forecast')
summary(fore_sa)
##
## Forecast method: ETS(A,N,N)
##
## Model Information:
## ETS(A,N,N)
##
## Call:
## ets(y = object, lambda = lambda, biasadj = biasadj, allow.multiplicative.trend = allow.multiplicative.trend)
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 14.4287
##
## sigma: 0.234
##
## AIC AICc BIC
## 9.221672 10.484830 12.628155
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.06606408 0.2236415 0.1783262 0.4239486 1.196449 0.1003285
## ACF1
## Training set 0.5022638
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2024 15.94801 15.64806 16.24796 15.48928 16.40674
## Feb 2024 15.94801 15.52384 16.37218 15.29930 16.59672
## Mar 2024 15.94801 15.42852 16.46750 15.15352 16.74250
## Apr 2024 15.94801 15.34816 16.54786 15.03062 16.86540
## May 2024 15.94801 15.27736 16.61866 14.92235 16.97367
## Jun 2024 15.94801 15.21336 16.68266 14.82445 17.07156
## Jul 2024 15.94801 15.15450 16.74152 14.73443 17.16158
## Aug 2024 15.94801 15.09971 16.79631 14.65065 17.24537
## Sep 2024 15.94801 15.04825 16.84777 14.57195 17.32407
## Oct 2024 15.94801 14.99958 16.89644 14.49752 17.39850
## Nov 2024 15.94801 14.95329 16.94273 14.42672 17.46930
## Dec 2024 15.94801 14.90906 16.98696 14.35908 17.53694
The simple moving average of order 3 would likely be best for forecasting this time series, as it balances responsiveness and smoothness.
As the order increases, the moving-average line becomes smoother but lags the data more.
ses_forecast <- ses(sales_window_ts, h=12)
plot(ses_forecast)
summary(ses_forecast)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = sales_window_ts, h = 12)
##
## Smoothing parameters:
## alpha = 0.5689
##
## Initial states:
## l = 14.575
##
## sigma: 0.5968
##
## AIC AICc BIC
## 58.57616 59.71902 62.23279
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08664673 0.5724014 0.4357888 0.4643479 2.903476 0.263182
## ACF1
## Training set -0.1094298
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 15.80741 15.04261 16.57220 14.63776 16.97705
## Mar 2024 15.80741 14.92751 16.68731 14.46171 17.15310
## Apr 2024 15.80741 14.82580 16.78901 14.30617 17.30864
## May 2024 15.80741 14.73369 16.88112 14.16530 17.44951
## Jun 2024 15.80741 14.64888 16.96593 14.03559 17.57922
## Jul 2024 15.80741 14.56987 17.04494 13.91475 17.70006
## Aug 2024 15.80741 14.49560 17.11921 13.80118 17.81363
## Sep 2024 15.80741 14.42533 17.18948 13.69370 17.92111
## Oct 2024 15.80741 14.35845 17.25636 13.59143 18.02339
## Nov 2024 15.80741 14.29453 17.32028 13.49367 18.12114
## Dec 2024 15.80741 14.23321 17.38160 13.39988 18.21493
## Jan 2025 15.80741 14.17418 17.44063 13.30961 18.30521
The SES forecast is a horizontal line, as SES captures only the level of the series, not trend or seasonality.
alpha = 0.5689: the model balances recent and historical data, giving moderately more weight to recent observations.
Initial level (l = 14.575): the estimated level at the start of the series.
sigma = 0.5968: the standard deviation of the residuals, indicating moderate error variability; a reasonable fit with room for improvement.
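The effect of alpha can be illustrated by fixing it manually instead of letting ses() estimate it (a sketch; the alpha values here are arbitrary):

```r
# Smaller alpha -> smoother, slower-reacting level;
# larger alpha -> level tracks recent observations closely.
ses_low  <- ses(sales_window_ts, alpha = 0.2, h = 12)
ses_high <- ses(sales_window_ts, alpha = 0.8, h = 12)

plot(sales_window_ts, main = "SES fitted levels for different alpha")
lines(fitted(ses_low),  col = "blue")
lines(fitted(ses_high), col = "red")
legend("topleft", legend = c("alpha = 0.2", "alpha = 0.8"),
       col = c("blue", "red"), lty = 1)
```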
plot(ses_forecast$residuals , main = "Residuals from SES Method",
ylab = "Residuals",
xlab = "Year")
The residual plot shows alternating peaks and troughs, suggesting there is structure (such as seasonality) left in the residuals.
hist(ses_forecast$residuals, main = "Residuals from SES Method",
xlab = "Residuals")
The histogram of residuals shows values spread around zero, but the distribution is asymmetric and skewed to one side.
cbind(Fitted = fitted(ses_forecast),
Residuals=residuals(ses_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
There is some clustering of points, suggesting the residual variance is not constant across fitted values.
cbind(Data = sales_window_ts,
Residuals=residuals(ses_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
Acf(ses_forecast$residuals)
The observed autocorrelations suggest the SES model fails to account for all patterns.
accuracy(ses_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08664673 0.5724014 0.4357888 0.4643479 2.903476 0.263182
## ACF1
## Training set -0.1094298
fore_ses <- forecast(ses_forecast, h=12)
plot(fore_ses)
print(fore_ses)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 15.80741 15.04261 16.57220 14.63776 16.97705
## Mar 2024 15.80741 14.92751 16.68731 14.46171 17.15310
## Apr 2024 15.80741 14.82580 16.78901 14.30617 17.30864
## May 2024 15.80741 14.73369 16.88112 14.16530 17.44951
## Jun 2024 15.80741 14.64888 16.96593 14.03559 17.57922
## Jul 2024 15.80741 14.56987 17.04494 13.91475 17.70006
## Aug 2024 15.80741 14.49560 17.11921 13.80118 17.81363
## Sep 2024 15.80741 14.42533 17.18948 13.69370 17.92111
## Oct 2024 15.80741 14.35845 17.25636 13.59143 18.02339
## Nov 2024 15.80741 14.29453 17.32028 13.49367 18.12114
## Dec 2024 15.80741 14.23321 17.38160 13.39988 18.21493
## Jan 2025 15.80741 14.17418 17.44063 13.30961 18.30521
The SES forecast is constant over time, reflecting a smoothed level based on recent data, with no upward or downward movement.
The SES model provides reasonable accuracy but falls short because it cannot account for the seasonality and trend evident in car sales data.
The SES model predicts the same value, 15.80741, for every month.
SES provides a simple baseline forecast, but it is unsuitable for a time series with trend and seasonal components.
hw_forecast <- hw(sales_window_ts, h=12)
plot(hw_forecast)
Holt-Winters model is forecasting both trend and seasonal effects for the next 12 months.
summary(hw_forecast)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = sales_window_ts, h = 12)
##
## Smoothing parameters:
## alpha = 3e-04
## beta = 3e-04
## gamma = 0.9997
##
## Initial states:
## l = 13.4318
## b = 0.135
## s = -1.2306 -0.0077 0.4442 -0.6829 -0.0744 -0.8751
## -0.4052 -0.8484 1.063 0.249 0.2812 2.0869
##
## sigma: 1.0647
##
## AIC AICc BIC
## 92.06298 179.49155 112.78387
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0293143 0.6387907 0.4857172 -0.2379943 3.12957 0.2933347
## ACF1
## Training set 0.05388727
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 17.05296 15.68855 18.41736 14.96628 19.13964
## Mar 2024 17.28743 15.92302 18.65183 15.20075 19.37411
## Apr 2024 18.06585 16.70144 19.43025 15.97917 20.15253
## May 2024 17.56794 16.20354 18.93235 15.48126 19.65462
## Jun 2024 18.04771 16.68330 19.41212 15.96103 20.13439
## Jul 2024 17.92124 16.55683 19.28565 15.83456 20.00793
## Aug 2024 17.52771 16.16330 18.89212 15.44102 19.61440
## Sep 2024 17.81103 16.44662 19.17545 15.72434 19.89773
## Oct 2024 17.41666 16.05223 18.78108 15.32995 19.50336
## Nov 2024 17.56237 16.19794 18.92680 15.47565 19.64908
## Dec 2024 18.00282 16.63838 19.36725 15.91609 20.08954
## Jan 2025 17.12986 15.76542 18.49431 15.04313 19.21660
alpha = 0.0003: very low, so the level is updated almost entirely from past estimates rather than new observations.
beta = 0.0003: very low, indicating minimal updating of the trend component.
gamma = 0.9997: very high, indicating the model heavily emphasizes seasonality.
Initial level (l = 13.4318): the estimated level at the start of the series.
Initial trend (b = 0.135) and seasonal indices (s): the starting trend slope and the initial seasonal index for each month.
sigma = 1.0647: the standard deviation of the residuals, indicating moderate residual variability.
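hw() also supports multiplicative seasonality and a damped trend; comparing AICc across these variants is a quick robustness check (a sketch, not part of the original analysis):

```r
# Alternative Holt-Winters specifications; lower AICc indicates a better trade-off
# between fit and complexity.
hw_add    <- hw(sales_window_ts, seasonal = "additive", h = 12)
hw_mult   <- hw(sales_window_ts, seasonal = "multiplicative", h = 12)
hw_damped <- hw(sales_window_ts, seasonal = "additive", damped = TRUE, h = 12)

c(additive       = hw_add$model$aicc,
  multiplicative = hw_mult$model$aicc,
  damped         = hw_damped$model$aicc)
```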
plot(hw_forecast$residuals , main = "Residuals from hw_forecast Method",
ylab = "Residuals",
xlab = "Time")
Residuals range approximately between -1.5 and 1, which is reasonable and suggests that the model is relatively accurate.
hist(hw_forecast$residuals, main = "Residuals from hw_forecast Method",
xlab = "Residuals")
The histogram is approximately symmetrical, although there is slight variability.
cbind(Fitted = fitted(hw_forecast),
Residuals=residuals(hw_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
The residuals appear randomly distributed across the range of fitted values, which is a good sign of model performance.
cbind(Data = sales_window_ts,
Residuals=residuals(hw_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
The residuals are evenly distributed across the range of actual values.
Acf(hw_forecast$residuals)
There are no significant spikes in the autocorrelation function.
accuracy(hw_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0293143 0.6387907 0.4857172 -0.2379943 3.12957 0.2933347
## ACF1
## Training set 0.05388727
hw_fore <- forecast(hw_forecast, h=12)
plot(hw_fore)
print(hw_fore)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 17.05296 15.68855 18.41736 14.96628 19.13964
## Mar 2024 17.28743 15.92302 18.65183 15.20075 19.37411
## Apr 2024 18.06585 16.70144 19.43025 15.97917 20.15253
## May 2024 17.56794 16.20354 18.93235 15.48126 19.65462
## Jun 2024 18.04771 16.68330 19.41212 15.96103 20.13439
## Jul 2024 17.92124 16.55683 19.28565 15.83456 20.00793
## Aug 2024 17.52771 16.16330 18.89212 15.44102 19.61440
## Sep 2024 17.81103 16.44662 19.17545 15.72434 19.89773
## Oct 2024 17.41666 16.05223 18.78108 15.32995 19.50336
## Nov 2024 17.56237 16.19794 18.92680 15.47565 19.64908
## Dec 2024 18.00282 16.63838 19.36725 15.91609 20.08954
## Jan 2025 17.12986 15.76542 18.49431 15.04313 19.21660
The Holt-Winters additive method is a time series forecasting model that accounts for level, trend, and seasonality.
The Holt-Winters model fits the data well, capturing trend and seasonality, with only minor residual patterns suggesting opportunities for refinement.
The model forecasts 17.12986 for January 2025, with increasing uncertainty reflected in the widening prediction intervals.
The Holt-Winters additive method provides a well-rounded forecast, capturing the trend and seasonal patterns observed in the data. Within the 12-month horizon, the seasonal peak is April 2024 (18.07) and the lowest forecast is February 2024 (17.05).
library(tseries)
adf.test(sales_window_ts)
##
## Augmented Dickey-Fuller Test
##
## data: sales_window_ts
## Dickey-Fuller = -1.3903, Lag order = 2, p-value = 0.8046
## alternative hypothesis: stationary
The time series is non-stationary. The ADF test confirms this: since the p-value (0.8046) is greater than 0.05, we fail to reject the null hypothesis of non-stationarity.
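A KPSS test (whose null hypothesis is stationarity, the reverse of the ADF null) is a useful cross-check; a sketch using the tseries package loaded above:

```r
# KPSS test: a small p-value here REJECTS stationarity,
# complementing the ADF result in the opposite direction.
kpss.test(sales_window_ts)
```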
diff_sales <- diff(sales_window_ts)
adf.test(diff_sales)
##
## Augmented Dickey-Fuller Test
##
## data: diff_sales
## Dickey-Fuller = -2.0302, Lag order = 2, p-value = 0.5609
## alternative hypothesis: stationary
diff_sales_2 <- diff(diff_sales)
adf.test(diff_sales_2)
## Warning in adf.test(diff_sales_2): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff_sales_2
## Dickey-Fuller = -4.6333, Lag order = 2, p-value = 0.01
## alternative hypothesis: stationary
tsdisplay(diff_sales_2)
Two differences are needed to make the series stationary.
Yes, a seasonal component appears to be needed as well.
ndiffs(sales_window_ts)   # suggested number of regular differences
nsdiffs(sales_window_ts)  # suggested number of seasonal differences
plot(diff_sales_2, main = "Differenced Time Series with regular differences")
plot(diff(sales_window_ts, lag = 12), main = "Differenced Time Series with seasonal difference")
acf(diff_sales_2, main = "ACF of Differenced Series")
pacf(diff_sales_2, main = "PACF of Differenced Series")
The ACF and PACF plots suggest the time series is influenced by recent past values, and the influence does not persist for long.
Based on the ACF and PACF patterns:
ARIMA(1, 1, 0) (AR(1) with first differencing and no MA component). ARIMA(0, 1, 1) (MA(1) with first differencing and no AR component). ARIMA(1, 1, 1) (both AR(1) and MA(1) with first differencing).
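The candidate models suggested by the ACF/PACF can be fit directly with Arima() and compared by AICc before running auto.arima() (a sketch):

```r
# Fit the three candidates on the windowed series and compare AICc (lower is better).
cand <- list(
  "ARIMA(1,1,0)" = Arima(sales_window_ts, order = c(1, 1, 0)),
  "ARIMA(0,1,1)" = Arima(sales_window_ts, order = c(0, 1, 1)),
  "ARIMA(1,1,1)" = Arima(sales_window_ts, order = c(1, 1, 1))
)
sapply(cand, function(m) m$aicc)
```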
sales_arima_model <- auto.arima(sales_window_ts, trace = TRUE, stepwise = FALSE)
##
## ARIMA(0,1,0) : 49.51692
## ARIMA(0,1,0) with drift : 51.86504
## ARIMA(0,1,1) : 47.07946
## ARIMA(0,1,1) with drift : 49.0446
## ARIMA(0,1,2) : 47.8305
## ARIMA(0,1,2) with drift : 50.38674
## ARIMA(0,1,3) : Inf
## ARIMA(0,1,3) with drift : Inf
## ARIMA(0,1,4) : Inf
## ARIMA(0,1,4) with drift : Inf
## ARIMA(0,1,5) : Inf
## ARIMA(0,1,5) with drift : Inf
## ARIMA(1,1,0) : 47.62492
## ARIMA(1,1,0) with drift : 49.93837
## ARIMA(1,1,1) : 49.25101
## ARIMA(1,1,1) with drift : 51.52018
## ARIMA(1,1,2) : 50.58533
## ARIMA(1,1,2) with drift : 53.55119
## ARIMA(1,1,3) : Inf
## ARIMA(1,1,3) with drift : Inf
## ARIMA(1,1,4) : Inf
## ARIMA(1,1,4) with drift : Inf
## ARIMA(2,1,0) : 47.27315
## ARIMA(2,1,0) with drift : 49.1998
## ARIMA(2,1,1) : Inf
## ARIMA(2,1,1) with drift : Inf
## ARIMA(2,1,2) : Inf
## ARIMA(2,1,2) with drift : Inf
## ARIMA(2,1,3) : Inf
## ARIMA(2,1,3) with drift : Inf
## ARIMA(3,1,0) : 46.10032
## ARIMA(3,1,0) with drift : 48.9853
## ARIMA(3,1,1) : 49.30433
## ARIMA(3,1,1) with drift : 52.4875
## ARIMA(3,1,2) : Inf
## ARIMA(3,1,2) with drift : Inf
## ARIMA(4,1,0) : 49.30772
## ARIMA(4,1,0) with drift : 52.49161
## ARIMA(4,1,1) : Inf
## ARIMA(4,1,1) with drift : Inf
## ARIMA(5,1,0) : 52.90824
## ARIMA(5,1,0) with drift : 56.53324
##
##
##
## Best model: ARIMA(3,1,0)
summary(sales_arima_model)
## Series: sales_window_ts
## ARIMA(3,1,0)
##
## Coefficients:
## ar1 ar2 ar3
## -0.4321 -0.1276 0.3964
## s.e. 0.1881 0.2025 0.1831
##
## sigma^2 = 0.2879: log likelihood = -18
## AIC=44 AICc=46.1 BIC=48.71
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0445009 0.491731 0.3732449 0.2239507 2.507289 0.2254104
## ACF1
## Training set 0.0109302
The best model is ARIMA(3,1,0), with AICc = 46.1 (AIC = 44). The model includes 3 AR (autoregressive) terms, 1 difference to make the series stationary, and no MA (moving average) terms.
models <- list(
"ARIMA(0,1,0)" = Arima(sales_ts, order = c(0,1,0)),
"ARIMA(0,1,1)" = Arima(sales_ts, order = c(0,1,1)),
"ARIMA(1,1,0)" = Arima(sales_ts, order = c(1,1,0)),
"ARIMA(1,1,1)" = Arima(sales_ts, order = c(1,1,1)),
"ARIMA(3,1,0)" = Arima(sales_ts, order = c(3,1,0))
)
# Extract and display AIC, BIC, and Sigma^2
results <- data.frame(
Model = names(models),
AIC = sapply(models, AIC),
BIC = sapply(models, BIC),
Sigma2 = sapply(models, function(m) m$sigma2)
)
print(results)
## Model AIC BIC Sigma2
## ARIMA(0,1,0) ARIMA(0,1,0) 193.2374 195.3318 1.418240
## ARIMA(0,1,1) ARIMA(0,1,1) 194.7002 198.8888 1.429097
## ARIMA(1,1,0) ARIMA(1,1,0) 194.8718 199.0605 1.433371
## ARIMA(1,1,1) ARIMA(1,1,1) 193.4999 199.7830 1.374762
## ARIMA(3,1,0) ARIMA(3,1,0) 197.0734 205.4507 1.438264
For the windowed series, ARIMA(3,1,0) gives AIC = 44, BIC = 48.71, and sigma^2 = 0.2879. (The comparison table above is fitted to the full series sales_ts, which is why its AIC values are much larger.)
y_t − y_{t−1} = −0.4321·(y_{t−1} − y_{t−2}) − 0.1276·(y_{t−2} − y_{t−3}) + 0.3964·(y_{t−3} − y_{t−4}) + ε_t
plot(sales_arima_model$residuals , main = "Residuals from Arima Method",
ylab = "Residuals",
xlab = "Time")
hist(sales_arima_model$residuals, main = "Residuals from Arima Method",
xlab = "Residuals")
cbind(Fitted = fitted(sales_arima_model),
Residuals=residuals(sales_arima_model)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
cbind(Data = sales_window_ts,
Residuals=residuals(sales_arima_model)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
Acf(sales_arima_model$residuals)
accuracy(sales_arima_model)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0445009 0.491731 0.3732449 0.2239507 2.507289 0.2254104
## ACF1
## Training set 0.0109302
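The residual diagnostics above can be consolidated with checkresiduals(), which also runs a Ljung-Box test on the ARIMA residuals (a sketch):

```r
# One-call residual diagnostics for the fitted ARIMA model:
# time plot, ACF, histogram, and a Ljung-Box test for leftover autocorrelation.
checkresiduals(sales_arima_model)
```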
sales_arima_forecast <- forecast(sales_arima_model, h = 12)
plot(sales_arima_forecast, main = "ARIMA Forecast for Next 1 Year", xlab = "Time", ylab = "Sales")
print(sales_arima_forecast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 15.89188 15.20430 16.57946 14.84031 16.94344
## Mar 2024 16.01435 15.22365 16.80506 14.80507 17.22363
## Apr 2024 15.56700 14.66640 16.46761 14.18965 16.94436
## May 2024 15.89490 14.73934 17.05046 14.12762 17.66217
## Jun 2024 15.85883 14.60965 17.10800 13.94838 17.76928
## Jul 2024 15.65524 14.28584 17.02464 13.56092 17.74955
## Aug 2024 15.87781 14.35258 17.40303 13.54518 18.21044
## Sep 2024 15.79330 14.18384 17.40276 13.33184 18.25476
## Oct 2024 15.72071 14.00226 17.43917 13.09256 18.34886
## Nov 2024 15.85110 14.02149 17.68071 13.05295 18.64925
## Dec 2024 15.77051 13.86200 17.67902 12.85170 18.68932
## Jan 2025 15.75993 13.75584 17.76401 12.69494 18.82492
sales_forecast_2yr <- forecast(sales_arima_model, h = 24)
plot(sales_forecast_2yr, main = "ARIMA Forecast for Next 2 Years", xlab = "Time", ylab = "Sales")
print(sales_forecast_2yr)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2024 15.89188 15.20430 16.57946 14.84031 16.94344
## Mar 2024 16.01435 15.22365 16.80506 14.80507 17.22363
## Apr 2024 15.56700 14.66640 16.46761 14.18965 16.94436
## May 2024 15.89490 14.73934 17.05046 14.12762 17.66217
## Jun 2024 15.85883 14.60965 17.10800 13.94838 17.76928
## Jul 2024 15.65524 14.28584 17.02464 13.56092 17.74955
## Aug 2024 15.87781 14.35258 17.40303 13.54518 18.21044
## Sep 2024 15.79330 14.18384 17.40276 13.33184 18.25476
## Oct 2024 15.72071 14.00226 17.43917 13.09256 18.34886
## Nov 2024 15.85110 14.02149 17.68071 13.05295 18.64925
## Dec 2024 15.77051 13.86200 17.67902 12.85170 18.68932
## Jan 2025 15.75993 13.75584 17.76401 12.69494 18.82492
## Feb 2025 15.82647 13.73363 17.91931 12.62575 19.02719
## Mar 2025 15.76712 13.59954 17.93469 12.45210 19.08213
## Apr 2025 15.78008 13.52852 18.03164 12.33661 19.22355
## May 2025 15.80843 13.48050 18.13636 12.24818 19.36869
## Jun 2025 15.77100 13.37239 18.16960 12.10264 19.43935
## Jul 2025 15.78869 13.31524 18.26215 12.00587 19.57152
## Aug 2025 15.79706 13.25483 18.33929 11.90906 19.68506
## Sep 2025 15.77635 13.16748 18.38521 11.78644 19.76626
## Oct 2025 15.79125 13.11453 18.46796 11.69756 19.88493
## Nov 2025 15.79077 13.05047 18.53106 11.59985 19.98169
## Dec 2025 15.78086 12.97781 18.58392 11.49396 20.06776
## Jan 2026 15.79111 12.92556 18.65667 11.40862 20.17360
The ARIMA (Auto-Regressive Integrated Moving Average) model is a widely used method for time series forecasting that combines autoregression (AR), differencing (I), and moving averages (MA) to capture patterns in the data.
The model has low training-error metrics, indicating a good in-sample fit.
The 12-month forecast shows point predictions fluctuating mildly around 15.8 million units, with prediction intervals that widen over the horizon; the 24-month forecast extends this pattern, gradually flattening toward the series mean.
After differencing, the model captures the short-term autocorrelation in the data, though as a non-seasonal ARIMA it does not model the seasonal pattern explicitly.
accuracy_naive <- accuracy(naive_forecast)
accuracy_avg <- accuracy(ses_forecast)
accuracy_hw <- accuracy(hw_forecast)
accuracy_arima <- accuracy(sales_arima_model)
accuracy_table <- data.frame(
Model = c("Naive", "Average", "Arima", "Holt-Winters"),
RMSE = c(accuracy_naive["Training set", "RMSE"], accuracy_avg["Training set", "RMSE"], accuracy_arima["Training set", "RMSE"], accuracy_hw["Training set", "RMSE"]),
MAPE = c(accuracy_naive["Training set", "MAPE"], accuracy_avg["Training set", "MAPE"], accuracy_arima["Training set", "MAPE"], accuracy_hw["Training set", "MAPE"]),
MAE = c(accuracy_naive["Training set", "MAE"], accuracy_avg["Training set", "MAE"], accuracy_arima["Training set", "MAE"], accuracy_hw["Training set", "MAE"]),
ME = c(accuracy_naive["Training set", "ME"], accuracy_avg["Training set", "ME"], accuracy_arima["Training set", "ME"], accuracy_hw["Training set", "ME"]),
MASE = c(accuracy_naive["Training set", "MASE"], accuracy_avg["Training set", "MASE"], accuracy_arima["Training set", "MASE"], accuracy_hw["Training set", "MASE"]),
MPE = c(accuracy_naive["Training set", "MPE"], accuracy_avg["Training set", "MPE"], accuracy_arima["Training set", "MPE"], accuracy_hw["Training set", "MPE"]))
accuracy_table$Model_Rank_RMSE <- rank(accuracy_table$RMSE)
accuracy_table$Model_Rank_MAE <- rank(accuracy_table$MAE)
accuracy_table$Model_Rank_MAPE <- rank(accuracy_table$MAPE)
accuracy_table$Model_Rank_ME <- rank(accuracy_table$ME)
accuracy_table$Model_Rank_MPE <- rank(accuracy_table$MPE)
print(accuracy_table)
## Model RMSE MAPE MAE ME MASE MPE
## 1 Naive 0.6486963 3.379049 0.5072917 0.02695833 0.3063640 0.08181538
## 2 Average 0.5724014 2.903476 0.4357888 0.08664673 0.2631820 0.46434786
## 3 Arima 0.4917310 2.507289 0.3732449 0.04450090 0.2254104 0.22395074
## 4 Holt-Winters 0.6387907 3.129570 0.4857172 -0.02931430 0.2933347 -0.23799425
## Model_Rank_RMSE Model_Rank_MAE Model_Rank_MAPE Model_Rank_ME Model_Rank_MPE
## 1 4 4 4 2 2
## 2 2 2 2 4 4
## 3 1 1 1 3 3
## 4 3 3 3 1 1
Naive Method: Assumes the next value will be equal to the last observed value. Serves as a baseline to evaluate the performance of more sophisticated models.
Simple Exponential Smoothing (SES): assigns exponentially decreasing weights to past observations via a smoothing parameter. It captures and adapts to the level of a time series, but cannot handle trend or seasonality.
Holt-Winters method: extends exponential smoothing with trend and seasonal components. Best for data exhibiting both trend and seasonality.
ARIMA (Auto-Regressive Integrated Moving Average): combines autoregression (AR), differencing (I), and moving averages (MA) to model a time series. Effective for data with complex autocorrelation patterns, including trends; the seasonal extension (SARIMA) handles seasonality.
Metric-by-metric comparison:
RMSE: best ARIMA (0.4917), worst Naïve (0.6487)
MAE: best ARIMA (0.3732), worst Naïve (0.5073)
MAPE: best ARIMA (2.5073%), worst Naïve (3.3790%)
ME: best Holt-Winters (-0.0293), worst Average (0.0866)
MASE: best ARIMA (0.2254), worst Naïve (0.3064)
MPE: best Holt-Winters (-0.2380%), worst Average (0.4643%)
Best Model: ARIMA is the most accurate across all key metrics (RMSE, MAE, MAPE, MASE) and should be the primary choice for forecasting.
Worst Model: Naive, as it cannot handle complex patterns, making it suitable only as a baseline.
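The rankings above are based on training-set (in-sample) errors; a fairer comparison holds out the last few months and measures out-of-sample accuracy. A minimal sketch (the 6-month holdout is an arbitrary choice; Holt-Winters is omitted here because a seasonal model needs at least two full seasonal cycles, which the short training window may not provide):

```r
# Hold out the final 6 observations of the windowed series for evaluation.
train <- window(sales_window_ts, end = c(2023, 7))
test  <- window(sales_window_ts, start = c(2023, 8))
h <- length(test)

# Refit each model on the training window and score against the holdout.
accuracy(naive(train, h = h), test)["Test set", c("RMSE", "MAPE")]
accuracy(ses(train, h = h), test)["Test set", c("RMSE", "MAPE")]
accuracy(forecast(auto.arima(train), h = h), test)["Test set", c("RMSE", "MAPE")]
```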
The time series is expected to maintain its seasonal patterns of recurring highs and lows while showing a gradual upward trend in sales over time.
The value of the time series is expected to increase slightly due to the gradual upward trend observed in the data, with seasonal fluctuations persisting.
The overall growth will remain stable, without dramatic changes in the trend or variability.
Focus on promotions in high-sales months: target months like April, when sales peak, and use discounts, financing offers, or deals to counter the usual sales dip in May.